Areal interpolation is the process by which the attributes of one set of polygons are interpolated to a different set of polygons. There are many applications for this process: for example, the interpolation of census data from census tracts to neighborhoods, or the interpolation of aggregated environmental data from one set of geographic boundaries to another.
Tobler, a component of the Python Spatial Analysis Library (PySAL), is a package of Python functions designed to perform and support areal interpolation. Visit Tobler's website and GitHub page for the official documentation.
In this Jupyter notebook tutorial we will use Tobler to perform two different methods of areal interpolation on two different scales of source data. In Part 1 we will use areal weighted and dasymetric methods to interpolate 2016 Canadian census data from Ottawa census tracts and dissemination areas to Ottawa neighborhoods. In Part 2 we will do the same but with synthetic population data instead of real data. By using this synthetic dataset we will know exactly what the results should be, and can thus more accurately assess the effects of the methods and scale.
The intended audience of this tutorial is students who have some prior experience using Python, JupyterLab, Jupyter Notebooks, Pandas, and GeoPandas. That being said, the code should work as-is, giving all students the opportunity to follow along through the steps of the analysis. If you are new to Jupyter notebooks and need help running code cells or performing other basic operations, check out Project Jupyter's introductory tutorials. Links to the different packages' documentation are included throughout the tutorial, and feel free to insert or split cells at any point in order to inspect any intermediate data.
Disclaimer: This tutorial and all related documents were created as a school assignment for Carleton University's GEOM 4008 – Advanced Topics in Geographic Information Systems course. Please forgive any errors, whether simple typographic ones or more critical errors in logic or syntax. If you happened upon this document through its GitHub repository or Carleton University's Open Source GIS Tutorials page and want to propose any changes, feel free to submit a pull request to this tutorial's GitHub repository or find me at my Twitter account.
This tutorial requires a number of modules. Tobler, Pandas, GeoPandas, and Plotly will perform most of the work, but see the comments below for why we are importing each.
# Import modules
# --------------
import geopandas as gpd # For operations on GeoDataFrames
from IPython.display import display_html # To display DataFrames side-by-side
import json # For plotting geometries with Plotly
import matplotlib.pyplot as plt # General purpose plotting
import numpy as np # Statistical functions and classes
import pandas as pd # For operations on DataFrames
import plotly.express as px # For interactive plotting
import tobler # For the areal interpolation functions
from scipy import stats # For the linear regression function
from shapely.geometry import Point # For the point geometry class
Before performing any analysis we need to read the data files from our data/ directory and assign them to variables. We will do this with GeoPandas' read_file() function.
# Read the data
# -------------
# City of Ottawa census tracts data
ct_gdf = gpd.read_file(filename='data/ottawa_ct_pop_2016.gpkg')
# City of Ottawa dissemination areas data
da_gdf = gpd.read_file(filename='data/ottawa_da_pop_2016.gpkg')
# City of Ottawa neighborhoods data
nbhd_gdf = gpd.read_file(filename='data/ottawa_neighborhoods.gpkg')
We can use Pandas' df.sample() method to inspect small random samples of the census tracts, dissemination areas, and neighborhoods GeoDataFrames (while ignoring their geometry columns). This will show us the column names and give us some examples of how the values of each are formatted. Because cell output in Jupyter notebooks is rendered as HTML, we can use Pandas' DataFrame styling methods to display several DataFrames side by side. While this takes more code than simply using the print() function or displaying the results of sample(), it facilitates comparisons and saves vertical space.
# Compare random samples from each GeoDataFrame
# ---------------------------------------------
# Assign a sampling of each GeoDataFrame to a DataFrame
df1 = ct_gdf.drop(columns='geometry').sample(5)
df2 = da_gdf.drop(columns='geometry').sample(5)
df3 = nbhd_gdf.drop(columns='geometry').sample(5)
# Style the DataFrames and include captions with the number of rows
style = "style='display:inline; padding:10px'"
df1_styled = df1.style.set_table_attributes(style).set_caption(f'Census Tracts, {len(ct_gdf)} rows')
df2_styled = df2.style.set_table_attributes(style).set_caption(f'Dissemination Areas, {len(da_gdf)} rows')
df3_styled = df3.style.set_table_attributes(style).set_caption(f'Neighborhoods, {len(nbhd_gdf)} rows')
# Display the three DataFrames
display_html(df1_styled._repr_html_() + df2_styled._repr_html_() + df3_styled._repr_html_(), raw=True)
| | ctuid | ct_pop_2016 |
|---|---|---|
| 6 | 5050037.00 | 8283 |
| 183 | 5050161.04 | 5320 |
| 146 | 5050125.03 | 527 |
| 136 | 5050033.01 | 5106 |
| 56 | 5050101.00 | 3001 |
| | dauid | ctuid | da_pop_2016 |
|---|---|---|---|
| 1369 | 35060738 | 5050160.02 | 465 |
| 444 | 35061896 | 5050171.09 | 2501 |
| 1025 | 35061323 | 5050135.02 | 697 |
| 1237 | 35060610 | 5050136.01 | 376 |
| 1307 | 35060944 | 5050001.06 | 668 |
| | nbhd_name | nbhd_pop_est |
|---|---|---|
| 97 | Dunrobin | 5569 |
| 27 | Hunt Club East - Western Community | 10784 |
| 51 | Vanier North | 9634 |
| 12 | Laurentian | 9654 |
| 40 | Sandy Hill | 11365 |
From this we can see that the census tracts and neighborhoods have roughly similar population values but the dissemination areas are quite a bit smaller.
For a quick spatial inspection let's plot the geometries of each GeoDataFrame next to each other using Matplotlib's subplots and GeoPandas' plot() methods. This will reveal the scale and shape of the geometries that we are working with.
# Plot the target and source geometries
# -------------------------------------
# Create subplots; set their size and padding
fig, axs = plt.subplots(nrows=1, ncols=3, figsize=(25, 25))
plt.subplots_adjust(wspace=0)
# Plot the Ottawa census tracts figure
ct_gdf.plot(ax=axs[0], color="none", linewidth=0.25)
axs[0].set_title('Source 1: Ottawa Census Tracts')
axs[0].axis('off')
# Ottawa dissemination areas figure
da_gdf.plot(ax=axs[1], color="none", linewidth=0.25)
axs[1].set_title('Source 2: Ottawa Dissemination Areas')
axs[1].axis('off')
# Ottawa neighborhoods figure
nbhd_gdf.plot(ax=axs[2], color="none", linewidth=0.25)
axs[2].set_title('Target: Ottawa Neighborhoods')
axs[2].axis('off')
The following Pandas and GeoPandas methods are great ways to inspect DataFrames and GeoDataFrames.
- df.info(): information about the DataFrame/GeoDataFrame, such as column names, data types, and length
- df.head(): first n rows of the DataFrame/GeoDataFrame (defaults to 5 rows)
- df.tail(): last n rows of the DataFrame/GeoDataFrame (defaults to 5 rows)
- df.sample(): random sample of n rows of the DataFrame/GeoDataFrame (defaults to 1 row)
- gdf.explore(): interactive map of a GeoDataFrame
- gdf.crs: coordinate reference system information of a GeoDataFrame

The DataFrame (df) methods also work on GeoDataFrames (gdf), but explore() only works with GeoDataFrames. Follow their links to see the documentation, as their outputs can be controlled by passing different arguments.
The cell below contains these data inspection methods. Just replace df or gdf before the dot notation with the DataFrame or GeoDataFrame's name that you want to inspect. For example:
- ct_gdf.info() shows info about the census tracts GeoDataFrame
- da_gdf.explore(column='da_pop_2016') displays an interactive map of the dissemination areas with a colormap of the population column

Alternatively, feel free to insert or split cells at any point in this tutorial in order to inspect intermediate data. These methods will really help to reveal exactly what is going on.
# Data inspection methods
# -----------------------
# Uncomment individual lines below and replace `df` or `gdf` with the name of the variable
# DataFrame (df) methods work on GeoDataFrames (gdf) but not the other way around
# df.info()
# df.head()
# df.tail(5)
# df.sample(15)
# gdf.explore()
# gdf.crs
Now that we have an understanding of the data we're working with we can move on to the area weighted and dasymetric interpolation.
Area weighted interpolation works by intersecting the source and target geometries, calculating the areal weights (i.e. the proportion a source geometry is covered by each given target geometry), and then allocating a proportional amount of the variable from the source to the targets. Please refer to Tobler's documentation of the tobler.area_weighted.area_interpolate() function for a mathematical description and help with the parameters.
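The core calculation can be sketched with a toy example. Here axis-aligned rectangles stand in for real polygons (hypothetical geometries, not the tutorial data, and not Tobler's actual implementation), but the weighting logic is the same as described above:

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def intersect(a, b):
    """Intersection rectangle of two axis-aligned rectangles."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

source = (0, 0, 2, 2)   # one source polygon with population 100
source_pop = 100
targets = {'west': (0, 0, 1, 2), 'east': (1, 0, 2, 2)}

# Areal weight = intersection area / source area; allocate the extensive
# variable to each target in proportion to its weight
alloc = {name: source_pop * rect_area(intersect(source, t)) / rect_area(source)
         for name, t in targets.items()}
print(alloc)  # {'west': 50.0, 'east': 50.0}
```

Each target covers half of the source, so each receives half of its population; the allocations always sum back to the source total.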
The first thing to consider is whether the variable to be interpolated is extensive or intensive. An extensive variable is one that depends on the size of the sample (e.g. population counts) while an intensive variable does not (e.g. population density). In this tutorial we will only be working with extensive variables (population counts) but it's worth noting that Tobler's interpolation methods can handle both types through the use of different parameters.
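A quick numeric illustration of the distinction, using made-up zones: when two source zones are combined into one target, an extensive variable is summed, while an intensive variable is averaged (weighted by area).

```python
# Two hypothetical source zones that together cover one target zone
areas = [2.0, 8.0]     # km^2
pops = [1000, 2000]    # extensive variable: population counts

# Extensive: apportioned values are SUMMED into the target
total_pop = sum(pops)
print(total_pop)  # 3000

# Intensive: population density is AVERAGED, weighted by area
densities = [p / a for p, a in zip(pops, areas)]   # [500.0, 250.0] people/km^2
avg_density = sum(d * a for d, a in zip(densities, areas)) / sum(areas)
print(avg_density)  # 300.0
```

Note that the area-weighted average density (300 people/km^2) is exactly the total population divided by the total area, which is why the two variable types need different handling.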
As mentioned, the areal interpolation operation will be performed twice: first with the census tracts as the source data, then with the dissemination areas. This will allow us to compare the results of these different source geometries to see if there's any benefit gained by using smaller census geographies (i.e. dissemination areas) as the source.
To use the tobler.area_weighted.area_interpolate() function we will pass source and target GeoDataFrames to it along with the column name of the extensive variable. The extensive variable will then be interpolated from the source geometry to the target geometry. The result will be a GeoDataFrame containing the target geometries and interpolated values of the extensive variable. After running these cells there will be no output but the results will be saved to the ct_area_interp_gdf and da_area_interp_gdf variables. These will be plotted once all four interpolation operations have been completed.
It's important to note that these three spatial datasets have already been reprojected into the WGS 84 / UTM zone 18N (EPSG:32618) CRS. If you use Tobler with your own GeoDataFrames, it's recommended to explicitly reproject the data into an appropriate UTM zone beforehand, even though Tobler will try to do this automatically. Reprojecting a GeoDataFrame is easily accomplished with the gdf.to_crs() method.
# Area weighted interpolation: census tracts to neighborhoods
# -----------------------------------------------------------
ct_area_interp_gdf = tobler.area_weighted.area_interpolate(source_df=ct_gdf,
target_df=nbhd_gdf,
extensive_variables=['ct_pop_2016'])
# Round the interpolation results to the nearest integer and change the type to integer
# (Population counts must be integers)
ct_area_interp_gdf['ct_pop_2016'] = ct_area_interp_gdf['ct_pop_2016'].round(decimals=0).astype(int)
# Rename the results column for clarity later
ct_area_interp_gdf = ct_area_interp_gdf.rename(columns={'ct_pop_2016':'ct_area_interp_est'})
# Area weighted interpolation: dissemination areas to neighborhoods
# -----------------------------------------------------------------
da_area_interp_gdf = tobler.area_weighted.area_interpolate(source_df=da_gdf,
target_df=nbhd_gdf,
extensive_variables=['da_pop_2016'])
# Round the interpolation results to the nearest integer and change the type to integer
# (Population counts must be integers)
da_area_interp_gdf['da_pop_2016'] = da_area_interp_gdf['da_pop_2016'].round(decimals=0).astype(int)
# Rename the results column for clarity later
da_area_interp_gdf = da_area_interp_gdf.rename(columns={'da_pop_2016':'da_area_interp_est'})
Dasymetric interpolation, or masked area weighted interpolation, incorporates secondary information to help refine the spatial distribution of the variable. For example, the extent of census tracts may include large portions of uninhabited land (e.g. parks, fields, water, etc.). Secondary information about the census tracts' land cover is used to mask out those uninhabited areas, which don't contain the extensive variable. Area weighted interpolation is then applied to the masked census tracts, and the results should be more accurate than simple area weighted interpolation on its own. This can be done manually by clipping the census tracts with another set of appropriate polygons (e.g. land cover polygons, municipal zoning, etc.) and then running the areal interpolation function, but Tobler includes a function that does this automatically using a land cover raster as the secondary information. Please refer to Tobler's documentation of the tobler.dasymetric.masked_area_interpolate() function for help with the parameters.
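The masking step can be illustrated with a toy example using axis-aligned rectangles in place of real geometries (hypothetical values, not Tobler's actual implementation): the source is first clipped to its inhabited portion, and ordinary areal weighting is then applied to the masked geometry.

```python
def rect_area(r):
    """Area of an axis-aligned rectangle (xmin, ymin, xmax, ymax)."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def intersect(a, b):
    """Intersection rectangle of two axis-aligned rectangles."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

source = (0, 0, 2, 2)        # hypothetical tract, population 100
source_pop = 100
inhabited = (0.5, 0, 2, 2)   # hypothetical mask: the western strip is uninhabited
targets = {'west': (0, 0, 1, 2), 'east': (1, 0, 2, 2)}

# Mask the source first, then weight by shares of the MASKED area
masked = intersect(source, inhabited)
alloc = {name: source_pop * rect_area(intersect(masked, t)) / rect_area(masked)
         for name, t in targets.items()}
print({k: round(v, 1) for k, v in alloc.items()})  # {'west': 33.3, 'east': 66.7}
```

Plain area weighting would split the population 50/50 here; the mask shifts population away from the target containing the uninhabited strip, which is the whole point of the dasymetric refinement.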
The land cover raster that we will use comes from the 2015 Land Cover of Canada. This 30 m spatial resolution land cover product is produced every 5 years and covers the whole country. Tobler's dasymetric interpolation function will automatically clip the raster to the extent of the source data (City of Ottawa), but I have already clipped it to reduce the file size and processing time.
The path of the land cover raster and the pixel value(s) of the land cover classes that contain dwellings are passed to the function along with the source and target GeoDataFrames, and the column name of the extensive variable. The result is a GeoDataFrame made up of the target geometries with interpolated values for the extensive variable.
For the same reason as before, we will perform the dasymetric interpolation twice: first with the census tracts as the source data, and then with the dissemination areas. masked_area_interpolate() will throw some warnings but just ignore them. The length of time it takes to run each of these cells on my computer has been included for your reference.
# Dasymetric interpolation: census tracts + urban landover to neighborhoods
#--------------------------------------------------------------------------
# Perform dasymetric interpolation
ct_dasy_interp_gdf = tobler.dasymetric.masked_area_interpolate(source_df=ct_gdf,
target_df=nbhd_gdf,
raster='data/ottawa_landcover.tif',
codes=[17],
extensive_variables=['ct_pop_2016'])
# Round the interpolation results to the nearest integer and change the type to integer
# (Population counts must be integers)
ct_dasy_interp_gdf['ct_pop_2016'] = ct_dasy_interp_gdf['ct_pop_2016'].round(decimals=0).astype(int)
# Rename the results column for clarity later
ct_dasy_interp_gdf = ct_dasy_interp_gdf.rename(columns={'ct_pop_2016':'ct_dasy_interp_est'})
# ~30 s; ignore warnings
/srv/conda/envs/notebook/lib/python3.10/site-packages/tobler/area_weighted/area_interpolate.py:614: UserWarning: `keep_geom_type=True` in overlay resulted in 172 dropped geometries of different geometry types than df1 has. Set `keep_geom_type=False` to retain all geometries
/srv/conda/envs/notebook/lib/python3.10/site-packages/tobler/area_weighted/area_interpolate.py:617: UserWarning: The CRS for the generated union will be set to be the same as source_df.
# Dasymetric interpolation: dissemination areas + urban landover to neighborhoods
#------------------------------------------------------------------------
# Perform dasymetric interpolation
da_dasy_interp_gdf = tobler.dasymetric.masked_area_interpolate(source_df=da_gdf,
target_df=nbhd_gdf,
raster='data/ottawa_landcover.tif',
codes=[17],
extensive_variables=['da_pop_2016'])
# Round the interpolation results to the nearest integer and change the type to integer
# (Population counts must be integers)
da_dasy_interp_gdf['da_pop_2016'] = da_dasy_interp_gdf['da_pop_2016'].round(decimals=0).astype(int)
# Rename the results column for clarity later
da_dasy_interp_gdf = da_dasy_interp_gdf.rename(columns={'da_pop_2016':'da_dasy_interp_est'})
# ~1.5 min; ignore warnings
/srv/conda/envs/notebook/lib/python3.10/site-packages/tobler/area_weighted/area_interpolate.py:614: UserWarning: `keep_geom_type=True` in overlay resulted in 289 dropped geometries of different geometry types than df1 has. Set `keep_geom_type=False` to retain all geometries
/srv/conda/envs/notebook/lib/python3.10/site-packages/tobler/area_weighted/area_interpolate.py:617: UserWarning: The CRS for the generated union will be set to be the same as source_df.
Because the Ottawa Neighbourhood Study data came with a population estimate for each neighborhood (nbhd_pop_est), we can compare those original estimated values against the interpolated values. That being said, it's important to note that the accuracy of this assessment will be limited for a number of reasons, chiefly that the neighborhood estimates were produced independently of the census data and their citywide total does not match the census total.
Despite these issues we will go ahead and use these estimates to assess the interpolation results. We will do the following:
- Merge the four interpolation results and the neighborhood estimates into a single GeoDataFrame
- Calculate error statistics comparing each set of interpolated values against the estimates
- Plot the percent error of each result relative to the estimates (nbhd_pop_est) as a map

# Merge the results for comparison
#---------------------------------
# Create a list of the GeoDataFrames (dropping the redundant geometry)
dfs = [nbhd_gdf,
ct_area_interp_gdf.drop(columns='geometry'),
ct_dasy_interp_gdf.drop(columns='geometry'),
da_area_interp_gdf.drop(columns='geometry'),
da_dasy_interp_gdf.drop(columns='geometry')]
# Concatenate the GeoDataFrames
interp_results_gdf = pd.concat(objs=dfs, axis=1)
# View a sample of the interpolation results GeoDataFrame (without the geometry column)
interp_results_gdf.sample(10).drop(columns='geometry')
| | nbhd_name | nbhd_pop_est | ct_area_interp_est | ct_dasy_interp_est | da_area_interp_est | da_dasy_interp_est |
|---|---|---|---|---|---|---|
| 102 | Brookside - Briarbrook - Morgan's Grant | 14940 | 15014 | 15636 | 15073 | 15668 |
| 59 | Glen Cairn - Kanata South Business Park | 9152 | 7688 | 11297 | 9532 | 11533 |
| 107 | Cardinal Creek | 10535 | 9584 | 10741 | 10976 | 11161 |
| 19 | Civic Hospital-Central Park | 10980 | 14733 | 11546 | 11192 | 10816 |
| 17 | Chapel Hill North | 10798 | 10921 | 10601 | 10724 | 10718 |
| 42 | South Keys - Greenboro West | 4221 | 6807 | 5911 | 5164 | 5049 |
| 45 | West Centretown | 9856 | 10735 | 10756 | 10050 | 10059 |
| 30 | Katimavik - Hazeldean | 11800 | 12521 | 13331 | 12164 | 12331 |
| 73 | Carleton University | 19 | 794 | 1347 | 379 | 434 |
| 92 | Metcalfe | 5240 | 6553 | 5846 | 5401 | 5409 |
Looking at this sample we can see that the different interpolation methods and source data have yielded results that look roughly in line with the neighborhood population estimates that came from the Ottawa Neighbourhood Study data (nbhd_pop_est).
Let's continue with our evaluation while not forgetting that the neighborhood population estimates have some issues. We will begin by defining some functions to calculate percent error, root mean square error, normalized root mean square error, mean bias error, mean absolute error, and the correlation coefficient (r). There are third-party modules that contain functions to perform these calculations, but defining our own is a good exercise.
# Define functions to assess the results
# --------------------------------------
# Percent error
def percent_error(estimated, expected):
''' Return absolute percent error where estimated and expected are numeric or array-like'''
return abs(((estimated - expected) / expected) * 100)
# Root mean square error
def rmse(estimated, expected):
''' Return Root Mean Square Error where estimated and expected are array-like'''
return np.sqrt(np.mean((estimated - expected) ** 2))
# Normalized root mean square error based on standard deviation
def nrmse(estimated, expected):
''' Return Normalized Root Mean Square Error where estimated and expected are array-like'''
return np.sqrt(np.mean((estimated - expected) ** 2))/np.std(estimated)
# Mean bias error
def mbe(estimated, expected):
''' Return Mean Bias Error where estimated and expected are array-like'''
return np.mean(estimated - expected)
# Mean absolute error
def mae(estimated, expected):
''' Return Mean Absolute Error where estimated and expected are array-like'''
return np.mean(abs(estimated - expected))
# r value
def r_value(estimated, expected):
''' Return r value where estimated and expected are array-like.'''
# Get the r value by unpacking the results of SciPy's linregress() function
slope, intercept, r, p_value, std_err = stats.linregress(estimated, expected)
return r
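As a quick sanity check on these definitions, here is a toy run of three of the error functions (restated so the snippet is self-contained) on hypothetical estimates for three zones:

```python
import numpy as np

def rmse(estimated, expected):
    ''' Return Root Mean Square Error where estimated and expected are array-like'''
    return np.sqrt(np.mean((estimated - expected) ** 2))

def mbe(estimated, expected):
    ''' Return Mean Bias Error where estimated and expected are array-like'''
    return np.mean(estimated - expected)

def mae(estimated, expected):
    ''' Return Mean Absolute Error where estimated and expected are array-like'''
    return np.mean(np.abs(estimated - expected))

# Hypothetical estimated vs. expected populations for three zones
estimated = np.array([110.0, 95.0, 105.0])
expected = np.array([100.0, 100.0, 100.0])

print(round(float(rmse(estimated, expected)), 2))  # 7.07
print(round(float(mbe(estimated, expected)), 2))   # 3.33 (positive -> overestimate on average)
print(round(float(mae(estimated, expected)), 2))   # 6.67
```

Note how MBE can be small even when individual errors are large, because positive and negative errors cancel; RMSE and MAE do not cancel, which is why we report all three.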
We will now use those functions to generate statistics on the four sets of interpolation results. The percent error values are concatenated to the interp_results_gdf so we can plot maps showing the percent error of each neighborhood. The other statistics are placed into their own DataFrame so we can assess them in a tabular format.
# Calculate statistics on the interpolation results
# -------------------------------------------------
# Assign the interpolation results columns to their own variables for clarity
ct_area_est = interp_results_gdf['ct_area_interp_est'] # Source: census tracts | method: area weighted
ct_dasy_est = interp_results_gdf['ct_dasy_interp_est'] # Source: census tracts | method: dasymetric
da_area_est = interp_results_gdf['da_area_interp_est'] # Source: dissemination areas | method: area weighted
da_dasy_est = interp_results_gdf['da_dasy_interp_est'] # Source: dissemination areas | method: dasymetric
expected = interp_results_gdf['nbhd_pop_est'] # Source: neighborhoods
# Create new columns in interp_results_gdf to represent percent error
interp_results_gdf['ct_area_interp_%_error'] = round(percent_error(ct_area_est, expected), 1)
interp_results_gdf['ct_dasy_interp_%_error'] = round(percent_error(ct_dasy_est, expected), 1)
interp_results_gdf['da_area_interp_%_error'] = round(percent_error(da_area_est, expected), 1)
interp_results_gdf['da_dasy_interp_%_error'] = round(percent_error(da_dasy_est, expected), 1)
# Create a DataFrame containing the other statistics
interp_stats_df = pd.DataFrame(data={'Interpolation Method': ['Area Weighted','Area Weighted',
'Dasymetric', 'Dasymetric'],
'Source Geographies': ['Census Tracts', 'Dissemination Areas',
'Census Tracts', 'Dissemination Areas'],
'RMSE': [rmse(ct_area_est, expected),
rmse(da_area_est, expected),
rmse(ct_dasy_est, expected),
rmse(da_dasy_est, expected)],
'NRMSE': [nrmse(ct_area_est, expected),
nrmse(da_area_est, expected),
nrmse(ct_dasy_est, expected),
nrmse(da_dasy_est, expected)],
'MBE': [mbe(ct_area_est, expected),
mbe(da_area_est, expected),
mbe(ct_dasy_est, expected),
mbe(da_dasy_est, expected)],
'MAE': [mae(ct_area_est, expected),
mae(da_area_est, expected),
mae(ct_dasy_est, expected),
mae(da_dasy_est, expected)],
'r': [r_value(ct_area_est, expected),
r_value(da_area_est, expected),
r_value(ct_dasy_est, expected),
r_value(da_dasy_est, expected)],
'r2': [r_value(ct_area_est, expected)**2,
r_value(da_area_est, expected)**2,
r_value(ct_dasy_est, expected)**2,
r_value(da_dasy_est, expected)**2]}).round(decimals=2)
# Display the DataFrame containing the statistics
interp_stats_df
| | Interpolation Method | Source Geographies | RMSE | NRMSE | MBE | MAE | r | r2 |
|---|---|---|---|---|---|---|---|---|
| 0 | Area Weighted | Census Tracts | 3048.00 | 0.54 | 604.50 | 1725.00 | 0.85 | 0.72 |
| 1 | Area Weighted | Dissemination Areas | 1694.34 | 0.31 | 604.48 | 926.21 | 0.96 | 0.92 |
| 2 | Dasymetric | Census Tracts | 2109.58 | 0.37 | 604.51 | 1291.68 | 0.93 | 0.87 |
| 3 | Dasymetric | Dissemination Areas | 1495.32 | 0.26 | 604.46 | 851.56 | 0.97 | 0.95 |
From these statistics it's pretty clear that, when comparing the results from the same source data, the dasymetric method is more effective than the area weighted method. This is shown by the lower error values (as represented by the RMSE, NRMSE, and MAE) and higher correlations (r values) achieved by the dasymetric method.
Interestingly, these statistics also show that using the relatively small dissemination areas as the source data yields the lowest errors and highest correlations - regardless of the interpolation method. This makes sense, as the dissemination areas are much smaller than the census tracts, giving a higher spatial resolution of data for the re-aggregation of the extensive variable from the source to the target geometries. The 'masking' approach that the dasymetric method uses can be seen as an attempt to mimic a higher spatial resolution source dataset, in both cases improving the results compared to the simple area weighted method.
The positive MBE values show that all four interpolations overestimate the neighborhood populations. As previously discussed, this is because the total population in the census data (934243) is higher than in the neighborhood data (867146). It's too bad these neighborhood estimates aren't closer to the census totals, as that would make this assessment more credible.
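One way to separate this systematic bias from distributional error would be to rescale the interpolated values so their grand total matches the neighborhood total before comparing. This is only a sketch of the idea, not part of the analysis above; the totals are the ones quoted in the discussion.

```python
# Totals quoted in the discussion above
census_total = 934243   # total population in the census source data
nbhd_total = 867146     # total of the Ottawa Neighbourhood Study estimates

# Scale factor that removes the systematic offset between the two totals
scale = nbhd_total / census_total
print(round(scale, 4))  # 0.9282

# Applied to a results column it would look something like (hypothetical):
# interp_results_gdf['ct_dasy_interp_est'] * scale
```

After such a rescaling the MBE would be near zero by construction, so only RMSE, MAE, and r would remain informative about how well each method distributes the population.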
Let's see if a visual assessment of the results lines up with this statistical one. Plotly offers a very powerful set of plotting tools which render spatial and non-spatial data as interactive figures. There are simpler ways to plot this with less code, but the interactive nature of Plotly figures is really great for exploratory data analysis. Before creating the Plotly choropleth facet plots, the data needs to be reworked into a specific format: the geometries have to be supplied as GeoJSON, and the values have to be melted into a single column with an additional column providing labels.
See the documentation for the Pandas df.melt() method and for Plotly choropleth and facet plots for more information.
# Plot the Percent Error of the four results using a Plotly Facet Map
# --------------------------------------------------------------------------------
# Convert interp_results_gdf from GeoDataFrame to GeoJson
geojson = interp_results_gdf.to_crs(epsg=4326).to_json()
geojson = json.loads(geojson)
# Reformat the interp_results_gdf for the plot
df = pd.DataFrame(data=interp_results_gdf.drop(columns='geometry'))
# Rename the results columns as they will become the facet plot labels
df = df.rename(columns={'ct_area_interp_%_error':'Source: Census Tracts <br> Method: Area Weighted',
'ct_dasy_interp_%_error':'Source: Census Tracts <br> Method: Dasymetric',
'da_area_interp_%_error':'Source: Dissemination Areas <br> Method: Area Weighted',
'da_dasy_interp_%_error':'Source: Dissemination Areas <br> Method: Dasymetric',})
# Combine all the results columns into one column labeled with a method column
df = df.melt(id_vars='nbhd_name',
value_vars=['Source: Census Tracts <br> Method: Area Weighted',
'Source: Census Tracts <br> Method: Dasymetric',
'Source: Dissemination Areas <br> Method: Area Weighted',
'Source: Dissemination Areas <br> Method: Dasymetric'],
var_name='method',
value_name='Error (%)')
# Create the Plotly Express choropleth figure
fig = px.choropleth(data_frame=df,
title="Areal Interpolation of 2016 Census Population Data to Ottawa, ON Neighborhoods",
locations='nbhd_name',
geojson=geojson,
featureidkey='properties.nbhd_name',
color='Error (%)',
facet_col='method',
facet_col_wrap=2,
hover_name='nbhd_name',
range_color=[0,100],
color_continuous_scale='Inferno',
#color_discrete_sequence= px.colors.sequential.Plasma,
projection='mercator',
fitbounds="locations",
height=800, width=800)
fig.update_layout(mapbox_style="open-street-map")
fig.for_each_annotation(lambda a: a.update(text=a.text.split('=')[-1]))
fig.show()